Belief Propagation (BP) is an important message-passing algorithm for various inference tasks on graphical models, including solving Constraint Optimization Problems (COPs). It has been shown that BP can achieve state-of-the-art performance on various benchmarks by mixing old and new messages before sending the new one, i.e., damping. However, existing methods that tune a static damping factor for BP are not only laborious but also harm performance. Moreover, existing BP algorithms treat each variable node's neighbors equally when composing a new message, which also limits their exploration ability. To address these issues, we seamlessly integrate BP, Gated Recurrent Units (GRUs), and Graph Attention Networks (GATs) to reason about dynamic weights and damping factors for composing new BP messages. Our model, Deep Attentive Belief Propagation (DABP), takes the factor graph and the BP messages in each iteration as input and infers the optimal weights and damping factors through GRUs and GATs, followed by a multi-head attention layer. Furthermore, unlike existing neural-based BP variants, we propose a novel self-supervised learning algorithm for DABP with a solution-cost-based loss that does not require expensive training labels and also avoids the common out-of-distribution issue through efficient online learning. Extensive experiments show that our model significantly outperforms state-of-the-art baselines.
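The damping the abstract refers to can be sketched in a few lines. The mixing rule below is the standard convex combination of old and new messages; DABP's contribution is inferring a per-iteration, per-edge damping factor rather than hand-tuning a static one. The message values here are hypothetical toy numbers:

```python
# Minimal sketch of message damping in belief propagation.
# lam = 0 recovers undamped BP; lam -> 1 freezes the old message.
def damp(old_msg, new_msg, lam):
    """Convex combination of the old and freshly computed message."""
    return [lam * o + (1.0 - lam) * n for o, n in zip(old_msg, new_msg)]

old = [0.9, 0.1]   # hypothetical message from the previous iteration
new = [0.2, 0.8]   # hypothetical freshly computed message
damped = damp(old, new, 0.5)  # evenly mixes the two messages
```

A static method would fix `lam` for the whole run; DABP instead predicts it dynamically from the factor graph and the current messages.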
Automatically identifying the structural substrates underlying cardiac abnormalities can potentially provide real-time guidance for interventional procedures. With knowledge of the cardiac tissue substrates, the treatment of complex arrhythmias such as atrial fibrillation and ventricular tachycardia can be further optimized by detecting the arrhythmia substrates for ablation. Optical coherence tomography (OCT) is a real-time imaging modality that helps satisfy this need. Existing approaches for cardiac image analysis mainly rely on fully supervised learning techniques, which suffer from the drawback of a labor-intensive, pixel-wise annotation process. To lessen the need for pixel-wise labeling, we develop a two-stage deep learning framework for cardiac adipose tissue segmentation using only image-level annotations on OCT images of human cardiac substrates. In particular, we integrate class activation mapping with superpixel segmentation to address the sparse tissue seed challenge raised in cardiac tissue segmentation. Our study bridges the gap between the demand for automatic tissue analysis and the lack of high-quality pixel-wise annotations. To the best of our knowledge, this is the first study to attempt to address cardiac tissue segmentation on OCT images via weakly supervised learning techniques. On an in-vitro human cardiac OCT dataset, we demonstrate that our weakly supervised approach trained on image-level annotations achieves performance comparable to fully supervised approaches trained on pixel-wise annotations.
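The integration of class activation maps (CAMs) with superpixels can be sketched as follows: a coarse CAM score is averaged over each superpixel, turning sparse seeds into region-level pseudo-labels. The arrays, threshold, and function name below are illustrative toys, not the paper's actual pipeline or real OCT data:

```python
# Hypothetical sketch: expand coarse CAM scores into superpixel-level
# pseudo-labels by averaging the CAM over each superpixel region.
def superpixel_pseudo_labels(cam, superpixels, threshold=0.5):
    """Label a superpixel as foreground if its mean CAM score passes threshold."""
    regions = {}
    for row_cam, row_sp in zip(cam, superpixels):
        for score, sp_id in zip(row_cam, row_sp):
            regions.setdefault(sp_id, []).append(score)
    return {sp_id: int(sum(v) / len(v) >= threshold) for sp_id, v in regions.items()}

cam = [[0.9, 0.8, 0.1],
       [0.7, 0.2, 0.0]]          # toy 2x3 activation map
superpixels = [[0, 0, 1],
               [0, 1, 1]]        # toy superpixel ids per pixel
print(superpixel_pseudo_labels(cam, superpixels))  # {0: 1, 1: 0}
```

Averaging over superpixels is what lets a handful of confident seed pixels label a whole coherent region, mitigating the sparsity of the raw CAM.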
Existing equivariant neural networks require prior knowledge of the symmetry group and discretization for continuous groups. We propose to work with Lie algebras (infinitesimal generators) instead of Lie groups. Our model, the Lie algebra convolutional network (L-conv), can automatically discover symmetries and does not require discretization of the group. We show that L-conv can serve as a building block to construct group-equivariant feedforward architectures for any group. Both CNNs and graph convolutional networks can be expressed as L-conv with appropriate groups. We find direct connections between L-conv and physics: (1) group-invariant loss generalizes field theory, (2) Euler-Lagrange equations measure robustness, and (3) equivariance leads to conservation laws and Noether currents. These connections open up new avenues for designing more general equivariant networks and applying them to important problems in the physical sciences.
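As a concrete illustration of the Lie-algebra viewpoint (not L-conv itself): exponentiating the so(2) infinitesimal generator recovers the full continuous rotation group SO(2), which is the kind of structure the paper works with without ever discretizing the group. The truncated-series exponential below is a toy implementation:

```python
import math

# The infinitesimal generator of 2-D rotations: exp(theta * J) is a
# rotation by theta. L-conv learns such generators; here J is given.
J = [[0.0, -1.0], [1.0, 0.0]]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm(theta, terms=30):
    """Matrix exponential exp(theta * J) via a truncated power series."""
    result = [[1.0, 0.0], [0.0, 1.0]]  # running sum, starts at identity
    term = [[1.0, 0.0], [0.0, 1.0]]    # current series term, A^n / n!
    tJ = [[theta * x for x in row] for row in J]
    for n in range(1, terms):
        term = [[v / n for v in row] for row in mat_mul(term, tJ)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

R = expm(math.pi / 2)  # a quarter turn: approximately [[0, -1], [1, 0]]
```

One generator thus encodes the entire one-parameter group, which is why working at the algebra level sidesteps group discretization.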
Recently, tile pruning has been widely studied to accelerate the inference of deep neural networks (DNNs). However, we found that the loss due to tile pruning can be large on a well-trained DNN because important elements may be eliminated. In this study, we propose a one-shot reparameterization method, called TileTrans, to reduce the loss of tile pruning. Specifically, we permute the rows or columns of the weight matrices such that the model architecture is kept unchanged after reparameterization. This reparameterization is achieved without any retraining of the DNN model. The proposed method groups important elements into the same tiles; thus, important elements are preserved after tile pruning. Furthermore, TileTrans can be seamlessly integrated into existing tile pruning methods because it is a preprocessing step executed before pruning and is orthogonal to most existing methods. Experimental results demonstrate that our method is essential for reducing the loss of tile pruning on DNNs. Specifically, the accuracy is improved by up to 17% for AlexNet and 5% for ResNet-34, where both models are pre-trained on ImageNet.
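The permutation idea can be sketched in a few lines: sort weight-matrix rows by an importance score so that important rows end up sharing tiles and survive tile pruning together. The L1-norm importance proxy below is an illustrative assumption, not necessarily the paper's measure, and a real reparameterization must also permute the adjacent layer so the network's function is unchanged:

```python
# Hypothetical sketch of row reordering before tile pruning: with tile
# height 2, grouping important rows means whole unimportant tiles can be
# pruned while important rows are preserved.
def reorder_rows_by_importance(weights):
    """Sort rows by L1 norm (a common importance proxy), descending."""
    order = sorted(range(len(weights)),
                   key=lambda i: -sum(abs(w) for w in weights[i]))
    return [weights[i] for i in order], order

W = [[0.1, 0.0],   # unimportant
     [3.0, 2.0],   # important
     [0.2, 0.1],   # unimportant
     [1.5, 1.0]]   # important
permuted, order = reorder_rows_by_importance(W)
print(order)  # [1, 3, 2, 0]
```

Before the permutation, each 2-row tile mixes an important row with an unimportant one; after it, the two important rows share the first tile and the second tile can be pruned with little loss.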
Distributed Constraint Optimization Problems (DCOPs) are an important subclass of combinatorial optimization problems, where information and controls are distributed among multiple autonomous agents. Previously, machine learning (ML) has been largely applied to solving combinatorial optimization problems by learning effective heuristics. However, existing ML-based heuristics often do not generalize across different search algorithms. Most importantly, these methods usually require full knowledge of the problem to be solved, which is not suitable for distributed settings where centralization is not realistic due to geographical limitations or privacy concerns. To address the generality issue, we propose a novel directed acyclic graph representation schema for DCOPs and leverage Graph Attention Networks (GATs) to embed the graph representations. Our model, GAT-PCM, is then pretrained offline with optimally labeled data to construct effective heuristics that boost a broad range of DCOP algorithms in which evaluating the quality of a partial assignment is critical, such as local search or backtracking search. Furthermore, to enable decentralized model inference, we propose a distributed embedding schema for GAT-PCM in which each agent exchanges only embedded vectors, and we show its soundness and complexity. Finally, we demonstrate the effectiveness of our model by combining it with a local search or backtracking search algorithm. Extensive empirical evaluations indicate that GAT-PCM-boosted algorithms significantly outperform state-of-the-art methods on various benchmarks. The pretrained model is available at https://github.com/dyc941126/gat-pcm.
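The core GAT operation the model builds on is attention-weighted aggregation of neighbor embeddings, which is also what makes the decentralized schema possible: an agent only needs its neighbors' embedded vectors, not their raw constraints. The scalar scores below stand in for the learned attention function and are purely illustrative:

```python
import math

# Toy sketch of attention-weighted neighbor aggregation (the basic GAT
# step). Real GATs compute the scores from learned parameters; here the
# scores are hypothetical inputs.
def attention_aggregate(neighbors, scores):
    """Softmax the scores, then combine neighbor embeddings by the weights."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(neighbors[0])
    return [sum(w * nb[d] for w, nb in zip(weights, neighbors))
            for d in range(dim)]

# Equal scores reduce to a plain average of the two neighbor embeddings.
h = attention_aggregate([[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0])
```

Because only the aggregated vector `h` (and the exchanged embeddings it is built from) crosses agent boundaries, the schema avoids centralizing the problem itself.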
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes the image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT exhibits strong robustness even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
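The NAIVEATTACK variant can be sketched as stamping a fixed trigger patch onto the raw images fed into distillation. The patch size, position, and value below are illustrative choices, not the paper's exact settings, and the toy grayscale image stands in for real training data:

```python
# Hypothetical sketch of naive trigger injection: overwrite a small
# bottom-right block of each image before distillation begins.
def add_trigger(image, patch=2, value=1.0):
    """Return a copy of the image with a patch x patch trigger stamped in."""
    stamped = [row[:] for row in image]          # copy, leave input intact
    rows, cols = len(stamped), len(stamped[0])
    for r in range(rows - patch, rows):
        for c in range(cols - patch, cols):
            stamped[r][c] = value
    return stamped

img = [[0.0] * 4 for _ in range(4)]   # toy 4x4 grayscale image
poisoned = add_trigger(img)           # bottom-right 2x2 block is now 1.0
```

DOORPING differs in that the trigger is not fixed up front but re-optimized at every distillation iteration, which is why it reaches higher ASR.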
Few-Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes from only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold: first, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features; second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, performance on novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modifications. When benchmarking on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shot counts, e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few-Shot Object Detection. Code and models will be available.
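The mask-based class centers mentioned above amount to masked average pooling: average the support features over the pixels the support mask selects to get one prototype vector per class. The tiny 2x2 feature map below is a toy stand-in for real backbone features:

```python
# Sketch of a mask-based class center (masked average pooling).
# features: H x W x C nested lists, mask: H x W of 0/1.
def masked_class_center(features, mask):
    """Average the feature vectors at positions where the mask is 1."""
    channels = len(features[0][0])
    total = [0.0] * channels
    count = 0
    for frow, mrow in zip(features, mask):
        for fvec, m in zip(frow, mrow):
            if m:
                count += 1
                total = [t + f for t, f in zip(total, fvec)]
    return [t / count for t in total]

feats = [[[1.0, 0.0], [3.0, 2.0]],
         [[5.0, 4.0], [7.0, 6.0]]]   # toy 2x2 map, 2 channels
mask = [[1, 0],
        [1, 0]]                       # support mask selects the left column
print(masked_class_center(feats, mask))  # [3.0, 2.0]
```

The resulting prototype is what re-weights the query features in the first (feature-level) reference step; the second reference reuses support object queries via cross-attention.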
This paper focuses on designing efficient models with low parameter counts and FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, the trade-off between model accuracy and constrained resources still needs further improvement. This work rethinks the essential unity of the efficient Inverted Residual Block in MobileNetv2 and the effective Transformer in ViT, inductively abstracting a general concept of the Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance even though instantiations share the same framework. Motivated by this observation, we deduce a simple yet efficient modern Inverted Residual Mobile Block (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependency and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase Efficient MOdel (EMO) based only on a series of iRMBs for dense applications. Extensive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, e.g., our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 accuracy, surpassing SoTA CNN-/Transformer-based models, while trading off model accuracy and efficiency well.
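The Meta-Mobile Block abstraction can be read as a fixed skeleton (expand, mix, project, residual) with an interchangeable mixer; iRMB is the instantiation that places convolution-like and attention-like mixing in that slot. The 1-D scalar-weight version below is a deliberately crude sketch of the skeleton only, with a moving-average stand-in for the mixer:

```python
# Hypothetical 1-D sketch of the Meta-Mobile Block skeleton: the mixer
# argument is the slot where different instantiations (conv-like,
# attention-like) plug in. Scalar expand/project stand in for 1x1 convs.
def meta_mobile_block(x, expand, mixer, project):
    hidden = [expand * v for v in x]           # pointwise expansion
    mixed = mixer(hidden)                      # interchangeable token mixer
    out = [project * v for v in mixed]         # pointwise projection
    return [a + b for a, b in zip(x, out)]     # residual connection

def avg_mixer(h):
    """Stand-in mixer: 3-tap moving average with edge clamping."""
    n = len(h)
    return [(h[max(i - 1, 0)] + h[i] + h[min(i + 1, n - 1)]) / 3.0
            for i in range(n)]

y = meta_mobile_block([1.0, 2.0, 3.0], expand=2.0, mixer=avg_mixer, project=0.5)
```

The paper's point is that two instantiations of this same skeleton can behave very differently, which is why the choice of mixer matters so much.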
Benefiting from its intrinsic ability to exploit supervision information, contrastive learning has recently achieved promising performance in the field of deep graph clustering. However, we observe that two drawbacks of the positive and negative sample construction mechanisms limit the performance of existing algorithms. 1) The quality of positive samples heavily depends on carefully designed data augmentations, while inappropriate augmentations easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are unreliable because they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) that mines the intrinsic supervision information in high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct positive samples from the same high-confidence cluster in the two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function that pulls together samples from the same cluster while pushing apart those from other clusters, by maximizing the cross-view cosine similarity between positive samples and minimizing it between negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with existing state-of-the-art algorithms.
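The cross-view objective can be sketched as follows: maximize cosine similarity for positive pairs (same high-confidence cluster across the two views) and minimize it for negative pairs (centers of different clusters). The loss form and the toy vectors below are illustrative assumptions, not the paper's exact formulation:

```python
import math

# Toy sketch of a cross-view cosine-similarity contrastive objective:
# lower loss means positives agree and negatives disagree.
def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def contrastive_loss(positive_pairs, negative_pairs):
    """Negative-pair similarity minus positive-pair similarity."""
    pos = sum(cosine(u, v) for u, v in positive_pairs)
    neg = sum(cosine(u, v) for u, v in negative_pairs)
    return neg - pos

loss = contrastive_loss(
    positive_pairs=[([1.0, 0.0], [1.0, 0.0])],   # aligned views, cos = 1
    negative_pairs=[([1.0, 0.0], [0.0, 1.0])],   # orthogonal centers, cos = 0
)
```

Using cluster centers as negatives (rather than random nodes) is what keeps the negative pairs semantically meaningful.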